Knowledge Discovery Using Genetic Programming with Rough Set Evaluation

Authors

  • David H. Foster
  • W. James Bishop
  • Scott A. King
  • Jack Park
Abstract

An important area of KDD research involves the development of techniques that transform raw data into forms more useful for prediction or explanation. We present an approach to automating the search for "indicator functions" that mediate such transformations. The fitness of a function is measured as its contribution to discerning different classes of data. Genetic programming techniques are applied to the search for and improvement of the programs that make up these functions. Rough set theory is used to evaluate the fitness of functions. It provides a unique evaluator in that it allows the fitness of each function to depend on the combined performance of a population of functions. This is desirable in applications that need a population of programs that perform well in concert, and it contrasts with traditional genetic programming applications, whose goal is to find a single program that performs well. This approach has been applied to a small database of iris flowers, with the goal of learning to predict the species of the flower given the values of four iris attributes, and to a larger breast cancer database, with the goal of predicting whether remission will occur within a five-year period.

Introduction

An important area of KDD research involves the development of techniques that transform raw data into forms more useful for prediction or explanation. We present an approach to automating the search for "indicator functions" that mediate such transformations. The fitness of a function is measured as its contribution to discerning different classes of data. Genetic programming techniques are applied to searching for and improving the programs that make up these functions. Rough set theory is used to evaluate the fitness of functions. It provides a unique evaluator in that it allows the fitness of each function to depend on the combined performance of a population of functions. This is desirable in applications that need a population of programs that perform well in concert, and it contrasts with traditional genetic programming applications, whose goal is to find a single program that performs well. This approach has been applied to a small database of iris flowers, with the goal of learning to predict the species of the flower given the values of four iris attributes (Fisher's iris data reproduced in Salzberg, 1990), and to a larger breast cancer database (breast cancer data reproduced in Salzberg, 1990), with the goal of predicting whether remission will occur within a five-year period.

The process begins by applying a population of randomly generated programs to elements of a database. Program results are placed in a matrix and evaluated to obtain a measure of each program's fitness. These fitness values are used to determine which programs will be kept and used in breeding the next generation. This process continues for a specified number of generations and is illustrated in Figure 1.

From: AAAI Technical Report WS-93-02 (Knowledge Discovery in Databases Workshop 1993, AAAI-93, page 254). Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

[Figure 1: Genetic Programming. Diagram labels: population of programs P = {p1, p2, ..., pn}; reproduction, crossover, mutation; execution; knowledge base; application data; program primitives; attribute value matrix (P x Data); significance of programs; minimal program sets; rank ordered programs.]
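The execution and evaluation steps described in the Introduction can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a program's fitness is the standard rough-set attribute significance (the drop in dependency degree when that program's column is removed from the matrix), and it uses hand-written Python predicates in place of evolved programs. The names `build_matrix`, `dependency`, `significance`, and the toy data are hypothetical; reproduction, crossover, and mutation are omitted.

```python
from collections import defaultdict

def build_matrix(programs, data):
    """Execution step: apply every program to every data element.
    Rows are data elements, columns are programs."""
    return [[p(x) for p in programs] for x in data]

def dependency(rows, labels, cols):
    """Rough-set dependency degree: the fraction of objects whose class
    is determined uniquely by their values on the chosen columns."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[c] for c in cols)].append(i)
    # An indiscernibility group counts only if all its members share a class.
    consistent = sum(len(idx) for idx in groups.values()
                     if len({labels[i] for i in idx}) == 1)
    return consistent / len(rows)

def significance(rows, labels):
    """Assumed fitness: for each program (column), how much the dependency
    degree of the whole population drops when that column is removed."""
    all_cols = list(range(len(rows[0])))
    full = dependency(rows, labels, all_cols)
    return [full - dependency(rows, labels, [c for c in all_cols if c != j])
            for j in all_cols]

# Toy data (hypothetical, not taken from the iris or breast cancer databases).
data = [(1.0, 0.5), (1.5, 2.0), (4.0, 0.7), (5.0, 3.0)]
labels = ["a", "a", "b", "b"]

# Stand-ins for evolved programs: simple predicates over the attributes.
programs = [
    lambda x: x[0] > 2.5,   # separates the two classes
    lambda x: x[1] > 1.0,   # unrelated to the class
    lambda x: 0,            # constant, carries no information
]

matrix = build_matrix(programs, data)
fitness = significance(matrix, labels)

# Rank order the programs by fitness, best first.
ranked = sorted(range(len(programs)), key=lambda j: fitness[j], reverse=True)
print(fitness, ranked)
```

Because significance is computed relative to the rest of the population, a program's score depends on what the other programs already discern. For example, adding a second copy of the first predicate would make each copy individually redundant and lower both scores, which is the population-level behavior the Introduction attributes to the rough set evaluator.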

Related resources

Application of Rough Set Theory in Data Mining for Decision Support Systems (DSSs)

Decision support systems (DSSs) are prevalent information systems for decision making in many competitive business environments. In a DSS, the decision-making process is intimately related to factors that determine the quality of information systems and their related products. Traditional approaches to data analysis usually cannot be implemented in sophisticated companies, where managers ne...

A Novel Methodology for Database Knowledge Discovery

This paper presents an application of rough sets and genetic algorithms to knowledge discovery in databases (RSGAKD). The purpose of the methodology is to use specified data for knowledge extraction from computer security logs. The methodology is outlined in terms of its objectives, scope, constraints, assumptions, and tools. The framework introduces a rough-set-based knowledge approach. Where appro...

Learning in Relational Databases: A Rough Set Approach

Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriente...

Study on multi-objective nonlinear programming in optimization of the rough interval constraints

This paper deals with a multi-objective nonlinear programming problem having rough intervals in the constraints. The problem is approached by taking the maximum value range and minimum value range inequalities as constraint conditions, reducing it to two classical multi-objective nonlinear programming problems, called the lower and upper approximation problems. All of the lower and upper approximatio...

Heuristic Knowledge Discovery for Archaeological Data Using Genetic Algorithms and Rough Sets

The goal of this research is to investigate and develop heuristic tools in order to extract meaningful knowledge from large-scale archaeological data sets. Database queries help us answer only simple questions. Intelligent search tools integrate heuristics with knowledge discovery tools, and they use data to build models of the real world. We would like to investigate these tools and combi...

Publication date: 2002